{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _ip_userguide:\n",
    "\n",
    "IP Addresses\n",
    "============"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Introduction\n",
    "------------\n",
    "\n",
    "The function :func:`clean_ip() <dataprep.clean.clean_ip.clean_ip>` cleans a column containing IP addresses, and standardizes them in a given format. The function :func:`validate_ip() <dataprep.clean.clean_ip.validate_ip>` validates either a single IP address or a column of IP addresses, returning `True` if the value is valid, and `False` otherwise."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Currently IPv4 and IPv6 are supported as valid input.\n",
    "\n",
    "The IP addresses can be converted into any of the following desired formats:\n",
    "* `compressed`: provides a compressed version of the ip address,\n",
    "* `full`: provides full version of the ip address,\n",
    "* `binary`: provides binary representation of the ip address,\n",
    "* `hexa`: provides hexadecimal representation of the ip address,\n",
    "* `integer`: provides integer representation of the ip address,\n",
    "* `packed`: provides packed binary representation of the ip address.\n",
    "\n",
    "The default output format is `compressed`.\n",
    "\n",
    "Invalid parsing is handled with the `errors` parameter:\n",
    "\n",
    "* \"coerce\" (default): invalid parsing will be set to NaN\n",
    "* \"ignore\": invalid parsing will return the input\n",
    "* \"raise\": invalid parsing will raise an exception\n",
    "\n",
    "After cleaning, a **report** is printed that provides the following information:\n",
    "\n",
    "* How many values were cleaned (the value must have been transformed).\n",
    "* How many values could not be parsed.\n",
    "* A summary of the cleaned data: how many values are in the correct format, and how many values are NaN."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### An example dataset containing ip addresses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "df = pd.DataFrame({\n",
    "    \"ips\": [\n",
    "        \"00.000.0.0\", \"455.0.0.0\", None, 876234, {}, \"00.12.021.255\",\n",
    "        \"684D:1111:222:3333:4444:5555:6:77\", b'\\xc9\\xdb\\x10\\x00'\n",
    "    ]\n",
    "})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Default `clean_ip`\n",
    "\n",
    "By default, `clean_ip` will clean ip addresses in IPv4 and IPv6 and output them in the compressed format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataprep.clean import clean_ip\n",
    "clean_ip(df, \"ips\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Input formats"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section demonstrates the input parameter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `ipv4`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Will parse only IPv4 addresses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", input_format=\"ipv4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `ipv6`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Will parse only IPv6 address. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", input_format=\"ipv6\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `auto` (default parameter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Will parse both IPv4 and IPv6 addresses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", input_format=\"auto\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Output formats"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `compressed` (default)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", output_format=\"compressed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `full`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", output_format=\"full\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `binary`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", output_format=\"binary\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `hexa`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", output_format=\"hexa\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `integer`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", output_format=\"integer\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `packed`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", output_format=\"packed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. `errors` parameter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `coerce` (default)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", errors=\"coerce\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `ignore`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_ip(df, \"ips\", errors=\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. `validate_ip()`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`validate_ip()` returns `True` if the input is a valid IP, otherwise `False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataprep.clean import validate_ip\n",
    "\n",
    "print(validate_ip(\"455.0.0.0\"))\n",
    "print(validate_ip({}))\n",
    "print(validate_ip(\" \"))\n",
    "print(validate_ip(\"0.0.0.0\"))\n",
    "print(validate_ip(\"684D:1111:222:3333:4444:5555:6:77\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_2 = validate_ip(df[\"ips\"])\n",
    "df_2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
